Centralized Failure Injection for Distributed, Fault-Tolerant Protocol Testing
Abstract
We describe a centralized approach to testing that distributed fault-tolerant protocols satisfy their safety and timeliness specifications in the presence of the very failures they are designed to tolerate. Cesium is a testing environment based on the centralized simulation of distributed executions and failures. Processes are run in a single address space while providing the appearance of a truly distributed execution. The human tester can force the occurrence of arbitrary failures and security attacks. The implementations under test are not instrumented for testing purposes, and their source code need not be available. We prove that Cesium can execute exactly the set of runs feasible in the real distributed system being simulated. We also show that there are safety and timeliness properties in the specifications of many existing distributed protocols that cannot be tested in practical distributed systems. All of these properties can, however, be accurately tested by Cesium without introducing any perturbation in test experiments.
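As a rough illustration of the general idea of centralized simulation with forced failures, the following minimal Python sketch runs two toy processes in a single address space and routes every message through one scheduler, which can be told to crash a process. It is not Cesium's implementation or API: the names `Simulator`, `add_process`, `crash`, and `run_ping_pong` are hypothetical, and the sketch ignores timing, security attacks, and uninstrumented binaries.

```python
from collections import deque


class Simulator:
    """Hypothetical central scheduler: every message passes through it."""

    def __init__(self):
        self.handlers = {}    # process name -> message handler
        self.queue = deque()  # pending (sender, receiver, payload) messages
        self.crashed = set()  # processes the tester has forced to fail
        self.log = []         # record of what the scheduler did

    def add_process(self, name, handler):
        self.handlers[name] = handler

    def send(self, sender, receiver, payload):
        self.queue.append((sender, receiver, payload))

    def crash(self, name):
        # Injected crash failure: the process receives nothing from now on.
        self.crashed.add(name)

    def run(self):
        # Deliver messages in FIFO order until none remain.
        while self.queue:
            sender, receiver, payload = self.queue.popleft()
            if receiver in self.crashed:
                self.log.append(f"dropped {sender}->{receiver}: {payload}")
                continue
            self.log.append(f"delivered {sender}->{receiver}: {payload}")
            self.handlers[receiver](self, sender, payload)


def run_ping_pong(crash_b=False):
    """Toy protocol (an assumption for illustration): a pings b, b replies."""
    sim = Simulator()
    received = []

    def a_handler(sim, sender, payload):
        received.append(payload)

    def b_handler(sim, sender, payload):
        sim.send("b", sender, "pong")

    sim.add_process("a", a_handler)
    sim.add_process("b", b_handler)
    if crash_b:
        sim.crash("b")
    sim.send("a", "b", "ping")
    sim.run()
    return received, sim.log
```

Calling `run_ping_pong()` lets the reply reach process `a`, while `run_ping_pong(crash_b=True)` shows the same unmodified handlers running against an injected crash, with the scheduler's log recording the dropped message.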
Related articles
Testing and Fault Injection of Distributed Protocols
A growing challenge confronting designers and implementors of safety-critical distributed systems is the evaluation and validation of dependability requirements. This paper addresses the problem of testing fault-tolerance capabilities of distributed protocols. It introduces a general framework for fault injection and testing of distributed systems and it describes the ongoing development of a tool...
A Leader Election Protocol for Fault Recovery in Asynchronous Fully-Connected Networks
We introduce a new algorithm for consistent failure detection in asynchronous systems. Informally, consistent failure detection requires processes in a distributed system to distinguish between two different populations: a fault-free population and a faulty one. The major contribution of this paper is in combining ideas from group membership and leader election, in order to have an election prot...
Fault Tolerance Multi Agent co-ordination: A petri net based approach
As technology shifts from centralized computing to distributed computing and then to ubiquitous computing, users become more dependent on the computer system for task delegation. Here autonomous agents and Multi Agent Systems (MAS) play an important role in performing the tasks delegated by the user. As faults in MAS are nondeterministic in nature, designing fault tolerant MAS is a challengin...
Testing of Fault-Tolerant and Real-Time Distributed Systems via Protocol Fault Injection
As software for distributed systems becomes more complex, ensuring that a system meets its prescribed specification is a growing challenge that confronts software developers. This is particularly important for distributed applications with strict dependability and timeliness constraints. This paper reports on ORCHESTRA, a portable fault injection environment for testing implementations of distr...
Redundancy Analysis and a Distributed Self-Organization Protocol for Fault-Tolerant Wireless Sensor Networks
Sensor nodes in a distributed sensor network can fail due to a variety of reasons, e.g., harsh environmental conditions, sabotage, battery failure, and component wear-out. Since many wireless sensor networks are intended to operate in an unattended manner after deployment, failing nodes cannot be replaced or repaired during field operation. Therefore, by designing the network to be fault-tolera...